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Abstract. We present a parallel implementation of a map-making algorithm for CMB anisotropy experiments 
which is both fast and efficient. We show for the first time a Maximum Likelihood, minimum variance map 
obtained by processing the entire data stream expected from the Planck Surveyor, under the assumption of 
a symmetric beam profile. Here we restrict ourselves to the case of the 30 GHz channel of the Planck Low 
Frequency Instrument. The extension to Planck higher frequency channels is straightforward. If the satellite 
pointing periodicity is good enough to average data that belong to the same sky circle, then the code runs very 
efficiently on workstations. The serial version of our code also runs on very competitive time-scales the map- making 
pipeline for current and forthcoming balloon borne experiments. 
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1. Introduction 

Making CMB maps out of balloon (e.g. QMAP, de 
Oliveira-Costa et al. 1998; MAXIMA- 1, Hanany et al. 
2000; BOOMERanG, de Bernardis et al. 2000) or space 
borne (MAPQ and the Planck Surveyor^) experiments is 
an important step of the standard data analysis pipeline. 
It allows for: a major lossless compression of the raw data 
with a minimum number of assumptions; checks on sys- 
tematics; tests for and/or estimation of foreground con- 
tamination; etc. Map-making is also important when sim- 
ulating a CMB experiment. It allows to: look for possible 
systematics; optimize the focal plane design and/or the 
experiment scanning strategy; etc. It is then highly de- 
sirable to develop tools able to attack the map-making 
problem also for forthcoming, high-resolution CMB space 
experiments. 

Map-making a la COBE (see e.g. Lineweaver 1994) can 
be extended to the differential, high resolution MAP ex- 
periment (Wright, Hinshaw and Bennett, 1996). Moreover 
Wright (1996) has discussed how to perform map-making 
in the case of one-horned experiments significantly af- 
fected by 1// noise, which introduces spurious correla- 
tions in the data time stream (see e.g. Janssen et al. 1996). 
To our knowledge this algorithm has not yet been imple- 
mented for the Planck Surveyor mission. To date, maps 
for Planck have only been produced by means of de- 
striping algorithms (Delabrouille et al. 1998; Maino et al. 
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1999). We remind that Planck spins at 1 rpm, has a bore- 
sight angle of 85° and observes the same circle of the sky 
for 1 hour (every hour the spin axis is moved, say along the 
ecliptic, by 2.5'). The destriping algorithms implemented 
for Planck assume that, after averaging data taken along 
a given circle (i.e. in one hour), the residual noise along 
the circle is well approximated by white noise plus an off- 
set. These offsets arc determined by minimizing the set of 
all possible differences between the antenna temperatures 
of the same sky pixel observed in two different scan cir- 
cles. This method rests on a number of assumptions, the 
stronger being that the knee frequency of the 1// noise is 
smaller than or at most comparable to the spin frequency. 
Wright's method does not suffer this limitation because it 
assumes to know the statistical properties of the noise, di- 
rectly derived from the data themselves (see e.g. Ferreira 
and Jaffe, 2000; Prunet et al. 2000). 

The purpose of this paper is to present the first imple- 
mentation of Wright's method to the Planck Surveyor, 
for the moment assuming a symmetric beam profile. In 
fact, we want to show, to our knowledge for the first 
time, the analysis of the entire time stream (14 months) of 
Planck simulated data to produce Maximum Likelihood, 
minimum variance CMB maps (we stress that maps ob- 
tained from destriping algorithms are not necessarily min- 
imum variance). The parallel, Message Passing Interface 
(hereafter MPI) implementation of the algorithm has been 
tested on a SGI Origin 2000 (hereafter 02K) and runs on 
time scales that might render Monte Carlo simulations 
feasible. This opens up the possibility of evaluating the 
CMB angular power spectrum via Monte Carlo techniques 
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(Wandclt, Hivon & Gorski 2000) rather than by maximiz- 
ing a Likelihood (see e.g. Bond, Jaffe & Knox 1998) or 
by directly evaluating the map angular correlation func- 
tion (Szapudi et al. 1999). We want to stress that our 
implementation does not assume any exact periodicity of 
sky pointing within single circle observation (i.e. no aver- 
age on the circle is performed, as instead required by de- 
striping algorithms) . If, on the contrary, it turns out that 
pointing periodicity is a good assumption (i.e. average on 
circles is performed) then the same code runs very effi- 
ciently on medium sized workstations (e.g. Pentium based 
PCs), again in quite short time scales. 

Although this paper might seem rather technical, we 
think that it can be of interest to a large community of 
CMB data analysts, involved either in the Planck col- 
laboration or in the forthcoming, new generation balloon 
experiments. 

The plan of this paper is as follows. In Sect. 2 and 3 
we will briefly discuss the method and its implementation. 
In Sect. 4 we will show tests and benchmarks of our im- 
plemented software. In Sect. 5 we will apply our tools to 
Planck and BOOMERanG simulated data. Finally, in 
Sect. 6 we will briefly review our conclusions. 

2. Method 

2.1. Statement of the problem 

Let us for completeness outline here the map-making algo- 
rithm and its assumptions. The primary output of a CMB 
experiment are the Time Ordered Data (TOD), d, which 
consist of Afd sky observations made with a given scan- 
ning strategy and at a given sampling rate (three points 
per FWHM, say). A map, m, can be thought as a vec- 
tor containing J\f p temperature values, associated with sky 
pixels of dimension ~ FWHM/3. Following Wright (1996) 
we assume that the TOD depend linearly on the map: 

d = Pm + n, (1) 

where n is a vector of random noise and P is some known 
matrix^ The rectangular, Afd x M p matrix P is dubbed 
a pointing matrix. That is, applying P on a map "un- 
rolls" the latter on a TOD according to a given scanning 
strategy. Conversely, applying P T on the TOD "sums" 
them into a map^. The structure of P depends on what 
we assume for m. If m contains a pixelized but unsmeared 
image of the sky then P must account for beam smearing. 
This is the most general assumption and allows one to 
properly treat, for instance, an asymmetric beam profile. 
In this case, applying P to m implies both convolving the 
sky pattern with the detector beam response and unrolling 

3 Note that the elements of m need not to be sky tempera- 
ture values. Other parameters may be fitted provided that they 
also depend linearly on the TOD. Sometimes these parameters 
are called "virtual" pixels. 

4 The value of a pixel of this map is the sum of all the ob- 
servations of that pixel made at different times according to a 
given scanning strategy. 



m into a "signal-only" time stream. If, on the other hand, 
the beam is -at least approximately- symmetric then it 
is possible -and certainly more convenient- to consider m 
as the beam smeared pixelized sky. The structure of P 
for a one-horned experiment would then be very simple. 
Only one element per row would be different from zero, 
the one connecting the observation of j-th pixel to the i-th 
element of the TOD. Hereafter we will restrict ourselves 
to this case and we will address the implementation of an 
asymmetric beam profile in a forthcoming paper. 

2.2. Least Squares Approach to Map-making 

Many methods have been proposed to estimate m in Eq. 
(|l|) [for a review see, e.g. Tegmark (1997)]. Since the prob- 
lem is linear in m, the use of a Generalized Least Squares 
(GLS) method appears well suited. This involves the min- 
imization of the quantity 

X 2 = n T Vn = (d T - m T P T )V(d - Pm) 

for some nonsingular, symmetric matrix V. Deriving with 
respect to m yields an estimator, say m, for the map: 

m = (P T VP) _1 P T Vd (2) 

The proof that this estimator is unbiased is straightfor- 
ward. Just note that: 

m m (P T VP) 1 P T Vn. 

So, provided that (n) = 0, we have (m) = m (the symbol 
(•) indicates, as usual, an average over the ensemble). The 
map covariance matrix is, then: 

IT 1 = ((m- m)(m T - m T )) = 

= (P T VP)" 1 P T V(nn T )VP(P T VP)" 1 

In order to have a "low noise" estimator, one has to find 
V that minimizes the variance of rh. This is attained if 
we take V to be the noise inverse covariance matrix, i.e. 
V -1 = N = (nn T ). Then rh has the very nice property 
of being, amongst all linear and unbiased estimators, the 
one of minimum variance 0. The GLS solution to the map- 
making problem is, then: 

m = E -1 P T N -1 d (3) 
where 

S = P T N _1 P (4) 

To see why this is the case, remember that any linear esti- 
mator can be written in the form rh = Ad = APm + An. So, 
the condition AP = I must hold for it to be unbiased. To prove 
that the variance of rh can not be smaller than the variance of 
rh, it helps (see, e.g. Lupton 1993) to write rh = rh + (rh — rh). 
It follows that 

Var(rh) = Var(rh) + Var(rh — rh) + 2Cov(rh, rh — rh). 

By direct substitution, one finds that Cov(rh, rh — rh) = 
(TP - I)(P T N" 1 P)" 1 = 0. Thus, the variance of rh is equal 
to the variance of rh plus a nonnegative quantity, which is all 
we need to show. 
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2.3. ML Solutions 

The statistical properties of detector noise are, as usual, 
described by a multivariate Gaussian distribution. This 
fact has two consequences. The first one is rather obvious: 
if the noise distribution is Gaussian, rh is indeed the ML 
estimator. In fact, in the Gaussian case, the Likelihood of 
the data time stream given the (true) map is 

£(d|m) = {2tt)- m ^ 2 x 

exp j-i[(d T - m T P T )N- 1 (d- Pm) + Tr(lnN)] j (5) 

Solving for the maximum would clearly give Eq. (|J) and 
Eq. (g) again. The second remark has to do with the notion 
of a lossless map, that is, a map which contains all the 
relevant cosmological information contained in the TOD. 
A good way to prove that a method is lossless is to show 
that the TOD and the related map have the same Fisher 
information matrix. The estimator defined in Eq. ^ leads 
to a lossless map (Tegmark 1997). 

3. Implementation 

The method outlined in Sect. || has the advantage of be- 
ing simple, linear and, hence, computationally appealing 
for Planck. In this Section we discuss our implementa- 
tion of the map-making algorithm to the 30 GHz channel 
of the Planck Low Frequency Instrument (LFI), with a 
nominal resolution ~ 30' FWHM. This implies Af d ~ 10 9 
(14 months of observation, i.e. about two sky coverages) 
and Af p ~ 10 6 . Note, however, that the implementation we 
present here is rather general since the algorithm does not 
take specific advantage of the details of Planck's scan- 
ning strategy and instrumental performances. In fact, we 
want to stress that our implementation of Wright's al- 
gorithm is, in principle, well suited for any one-horned, 
balloon- or space-borne CMB experiment. 

3.1. The Noise Covariance Matrix 

As already mentioned in the Introduction, the GLS 
method assumes to know the statistical properties of the 
noise. They are fully described by N, the noise covariance 
matrix in the time domain. It is of course highly desir- 
able to estimate N directly from the data since there is 
no a priori guarantee that the noise statistical properties 
match with what is measured during ground testing. The 
measured TOD is a combination of signal and noise. Thus, 
one has to estimate the signal, subtract it from the TOD 
and only then estimate N. Ferreira and Jaffe (2000) and 
Prunet et al. (2000) discuss iterative methods to attack 
this issue. 

The problem of evaluating the noise properties directly 
from the TOD is a bit far from the main point we want to 
make here: the possibility, given N, to analyze the entire 
Planck simulated data set to produce a ML, minimum 
variance map. In any case, we will show in Sect. ^jthat for 
the Planck case, to zero-th order, we can estimate the 
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signal by projecting the TOD onto the sky. This implies 
applying P T [summing the TOD into a map], (P T P) _1 
[dividing the pixel values by the number of hits] and P 
[unrolling a map into a time stream] to d. The noise esti- 
mator, n, can then be written as follows: 

n = d P(P T P)- 1 P T d = [I - P(P T P)- 1 P T ]n (6) 

and does not include any contribution from the signal. In 
a forthcoming paper (Natoli et al. 2001) we will address 
in more detail, and specifically for the Planck Surveyor, 
the problem of estimating the noise statistical properties 
directly from flight data. Here, in Sect. [|, we will only show 
that n allows a reasonably good estimate of the "in-flight" 
noise behavior. 

3.2. Stationary noise 

It is customary to assume that the statistical properties 
of detector noise do not change over the mission life time. 
The formal way to state this property is to write the ele- 
ments of the noise covariance matrix as follows: 

Nij = tQi-3\) (7) 

or, equivalently, to say that N is a Toeplitz matrix. If £ 
vanishes for \i — j\ > A/j <C Afd, then N is band diagonal 
and it can be well approximated by a circulant matrix: 

N i+1>j+1 = N i:j mod A/d- (8) 

Obviously, circularity does not hold exactly for N: there 
is no physical reason for the noise correlation function to 
wrap around itself past the edges. However, the number 
of elements to modify in order to turn N into a circulant 
matrix is a mere A/^(A/^ + 1) <c A/J. A circulant matrix 
is diagonal in Fourier space: N = F^HF, where F (F^) 
is the discrete Fourier transform (antitransform) operator 
(DFT), and H is diagonal. It is trivial to show that the 
inverse of a circulant matrix is still circulant and, in par- 
ticular, that N _1 = F^H _1 F. Circularity ensures that we 
will not have edge problems: this is exactly the boundary 
condition imposed by the DFT. 

3.3. Non-stationary noise 

The ML solution of the map-making problem [see Eq.{j3j) 
and Eq. (g)] can be easily generalized to the case of piece- 
wise stationary noise. In this case, the whole TOD can be 
split into a set {d( K )} of smaller pieces k = 1, N c . Clearly, 

d(«) = P(«)m + n (re) , (9) 

where P( K ) is a pointing matrix and ri( K ) is a vec- 
tor of random, stationary noise with covariance ma- 
trix N( K j. Neglecting cross-correlations between different 
ri( K )'s (a reasonable assumption if A/$ <C Afd) reduces the 
Likelihood given in Eq. (||) to the product of the Likelihood 
of the different pieces, which of course do not need to share 
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the same noise covariance matrix. Thus, in this case, the 
ML map is given by 



m 



E P W N W P M 



(10) 



Since neglecting the correlation between different n/ re )'s 
is de facto equivalent to treating different sections of the 
same TOD as independent experiment outcomes, we can 
obviously think of Eq. (^) as a simple recipe to produce 
maps from Af c different horns whose sky coverage is at 
least partially overlapped. 

3.4. The Noise Inverse Covariance Matrix 

In what follows we do not need to explicitly use N _1 , 
although we have in principle all the relevant information 
to do so. In fact, the first row of N" 1 , usually called a 
noise filter in the literature, can be computed by taking 
a DFT of H _1 , the noise inverse spectral density. Then, 
by exploiting the circularity of N _1 we could in principle 
reconstruct the whole matrix. Needless to say, handling a 
Afd x Afd matrix is, for the Planck case, unconceivable 
even for the 30 GHz case. 

Fortunately, under the assumption of circularity the 
application of the noise inverse covariance matrix to the 
TOD is simply a convolution, 



N _1 d = F t S _1 Fd, 



(11) 



and we do not need to store a huge matrix. 

Asking how long a noise filter should be is a very im- 
portant question. In fact, from the one hand, it is desirable 
to convolve the TOD with a filter properly sampled in Afd 
points (something already numerically feasible even for a 
PLANCK-like TOD). On the other hand, Eq. (O) works 
for real data in the limit Af% <C Afd (so that N -1 can be 
considered circulant). Also note that an obvious bound 
on Af$ comes from the instrument: it is difficult to imag- 
ine that the Planck detectors will remain "coherent" for 
the whole mission life-time. A further benefit in having 
A/j <C Afd is the possibility of performing a piecewise con- 
volution in Eq. (pi]). In any case, in Sect. || we will show 
the sensitivity of our results to Af$ ■ 



3.5. The Conjugate Gradient Solver 

Evaluating m requires solving Eq. ([ 
nicnce we rewrite as 



H- 1 P T N" 1 (Prh) 



H _1 P T N _1 d 



that for convc- 



(12) 



where H _1 is a preconditioning matrix (or precondi- 
tioner). This linear system can be solved by means of a 
Conjugate Gradient (CG) minimization technique (see e.g. 
Axelsson & Barker, 1984). CG methods are known to con- 
verge efficiently [i.e. in less than the 0(AfS) operations 
required by standard matrix inversion] if the eigenvalues 
of S spawn a few orders of magnitudes. If this is not the 



case, the method should be preconditioned. This implies 
finding a matrix H such that its inverse approximates^ 
The method then solves the equivalent system given 
in Eq. (p"2"|). Other than being a good approximation to 
the original matrix, the preconditioner should also be fast 
to compute and invert. These often are conflicting require- 
ments: finding a good preconditioner may be hard. In our 
case, we find more than satisfactory to precondition our 
system with the diagonal part of the symmetric and pos- 
itive definite matrix S. This choice is motivated by the 
fact that X is diagonally dominant. Furthermore, the di- 
agonal part of S is very close to the number of hits per 
pixel P T P so in practice we are normalizing each row of 
the inverse covariance matrix to the "redundancy" of the 
corresponding pixel^. 

A CG algorithm does not need to explicitly invert S. 
We want to stress that earlier methods implemented to 
solve the map-making problem for balloon-borne exper- 
iments (Borrill, 1999) require the inversion of the same 
matrix, surely an unpleasant task when considering large 
maps. Such implementations usually rely on the fact that 
S is (analytically speaking) positive definite and, as such, 
a candidate to be Cholesky decomposed in 0(Afp) oper- 
ations. This fact makes the procedure prohibitively ex- 
pensive -even for present day supercomputers- when Af p 
becomes large (> 10 5 ). Also, £ may not be exactly pos- 
itive definite due to small inaccuracies when estimating 
the noise correlation properties, making Cholesky decom- 
position critical. Note that using an iterative solver could 
potentially reduce the above operation count by a factor 
Afp /Af iter- However, storage requirement for X would still 
be 0(Afp ), prohibitive for Planck 

To conclude this Section, let us stress that the map 
we obtain, m, has the correct noise covariance matrix, S, 
even if we never evaluate nor store it directly. 



3.6. Parallelization 

The scheme outlined above naturally lends itself to a 
multi-processor environment. In fact, it is straightforward 
to divide the entire TOD in a set of smaller, partially 
overlapped, time streams and force each of the Processor 
Elements (PE) to perform map-making with its own time 
stream. This is accomplished by explicitly assigning work 



6 To be quantitative, an optimal preconditioner would give 
H~ £ = I + R with all the eigenvalues of R less than 1 
(Axelsson & Barker, 1984). It is then clear that multiplying S 
by the preconditioner changes its spectrum to give an equiva- 
lent but better conditioned system. 

7 One other problem should be mentioned: E is remarkably 
ill conditioned. In fact, any monopole term belongs to its ker- 
nel since applying N _1 kills the corresponding TOD offset. 
Our preconditioner does not cure this problem which is instead 
fixed by specifically imposing that rh has a given (usually zero) 
average. This prescription, critical if one attempts a matrix in- 
version, is not needed in our case: we can, of course, always 
subtract out the monopole. 
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loads to each PE by means of MPI calls embedded in the 
Fortran 95 code. 

As stated above, the working assumption that vali- 
dates Eq. ( |ll| ) is that -C M&- Since the number of 
PE, AfpE, is at most 0(100) the previous condition is 
even fulfilled for the single PE. Thus, each PE performs 
the convolution of Eq. ( pT|) b y executing its own serial 
FFT as discussed in Sect. 3.4. This is more efficient than 



spreading over the PE's an intrinsically parallel FFT, the 
advantage being that we limit inter-processor communi- 
cations. However, as a drawback, each PE has to handle a 
partial map of size Af p and cross talk among PE's is nec- 
essary to merge AfpE of these maps at the end of each CG 
iteration. Note that increasing the number of processors 
Mpe while keeping Md constant may result in the opera- 
tion coun t being dominated by this merging (see however 
Sect. U below). 

4. Performances 

In this Section we benchmark an implementation of the 
map-making algorithm described above. The benchmarks 
are produced within the framework of the Planck's simu- 
lation pipeline. However, since the algorithm does not take 
specific advantage of Planck's scanning strategy and in- 
strumental performance, they are indeed very general. To 
show that this is the case we will apply our code t o a 
simulation of the BOOMERanG experiment (see Sect. |5.3| 
below). 
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4.1. Scaling with J\f^, J\f d and N p 

Part of the l.h.s. of Eq. (|l2|) can be thought as the applica- 
tion of the P T N _1 operator (the same acting on d on the 
r.h.s.) to Prh^- 1 , the data stream obtained by unrolling 
the temptative solution produced by the CG at the k-th 
step. While the application of P T N _1 to d is done just 
once, the application of the same operator to Pm' 1 ' must 
be done for every iteration of the CG algorithm. Therefore, 
speed of execution becomes critical. 

If we use a radix-2 Fast Fourier Transform (FFT) to 
perform a piecewise convolution of the TOD, the oper- 
ation count is expected cx Md log 2 Af% ■ The lion share of 
CPU time is taken by the (real) DFT transform. The use 
of an efficient FFT library greatly speeds up the calcula- 
tion. We used the publicly available FFTW library (Frigo 
& Johnson 1998). This code has the nice property of tai- 
loring itself to the architecture over which is executed, 
therefore greatly enhancing cross-platform efficiency. In 
Fig. [I] (upper panel) we plot CPU time per CG iteration, 
r, versus A/£. The behavior shown in the figure is due to 
the non constant efficiency attained by the FFTW library 
when performing transforms of different lengths. FFT rou- 
tines are usually more efficient when processing power-of- 
two transforms and FFTW is no exception. Thus, we fix 
A/f to a power of two. 

The application of P (P T ) to a map (TOD) is expected 
to scale linearly with Md- In Fig. |l| (middle panel) we give 
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Fig. 1. The wall-clock time r(s) on a single 500 MHz 
Pentium III CPU per CG iteration plotted vs. the noise 
filter length, A/^, vs. the TOD length, Ad and vs. the map 
pixel number, M p . 



t as a function of Ad for a given map size [M p — 786 432, 
as expected by choosing A4ide = 256 in the HEALPix 
pixelization (Gorski et al. 1999)] and the Planck baseline 
scanning strategy. The scaling with Md is indeed linear: 
t = [(A/d/l,400) + 50]ms. 
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Since we never build up the N v x N p inverse covari- 
ance matrix £, but rather perform the piecewise multipli- 
cation P T N^ 1 (Pm) each time we need to do so, this op- 
eration is not expected to scale with the number of pixels. 
However, multiplying by the preconditioner, H , does. 
When N p < Nd (a condition usually fulfilled for a CMB 
experiment) the preconditioner should produce a negligi- 
ble effect on the operation count. Conversely, we expect 
an increase in the operation count as, keeping Nd fixed, 
N p increases. This is confirmed in Fig. [l] (lower panel). 
Note that, for the last two points in the plot, N p ~ Nd- 
Another factor that should in principle contribute to the 
scaling of r with N p is the performance overhead result- 
ing from handling the arrays containing the maps. Note, 
however, that r increases by only a factor ~ 2 when N p is 
boosted by a factor > 200. 

All the scalings given in Figjj] have been obtained from 
a single processor job using a 500 MHz Pentium III CPU. 

4.2. Scaling with Npe 

In order to test our parallel software we use the following 
simulated data sets (relative to the LFI 30 GHz chan- 
nel), for which Nd changes by a couple of orders of mag- 
nitudes: V60, the time stream obtained by averaging on 
circles the full Planck TOD (N d = 20, 196,000); V3, a 
fictitious time stream obtained assuming that each circle 
is observed only 20 times (N d = 403, 920, 000); V, the full 
Planck TOD (N d = 1,211,760,000). In the ideal case, 
the algorithm speed r _1 should scale linearly with Npe 
and should be inversely proportional to Nd- So, we can 
define an efficiency index 



8 = 1.^ 



N 



PE 



(13) 



that we measure on an 02K. For 7- > 60, increasing Npe 
from 1 to 8 results in a loss of efficiency of ~ 50% due 
to inter-processor communications. If we keep Npe = 8 
but change "P60 to "P3 and V, £ increases. We find that £ 
is basically the same when comparing V60 and Npe = 1 
with V and Npe — 8. Thus, for these cases, our code 
shows the ideal scaling given in Eq. 

4.3. Convergence criterion and precision 

A natural criterion to stop the CG solver is to require that 
the fractional difference 



|P T N- 1 Pm( fc ) - P^-^l 
IIP^N-MII 



(14) 



obtained at the fc-th iteration step is less than a supplied 



tolerance. The norm 
Euclidean norm[| 



in Eq. 



14j) represents the usual 



An alternative convergence criterion is the more stringent 
Loo norm, i.e. the absolute value of the largest element of a vec- 
tor. We have tested that, for our applications, the CG solver 
returns almost identical results using either the Euclidean or 




Fig. 2. Solver precision as a function of the CG iteration 
step. Solid and dotted curves refer to the parallel code 
running on the full V TOD (no average on circles) for 
"signal-only" and "noise-only" TOD, respectively. Dot- 
dashed and dashed curves refer to the serial code running 
on P60, again for "noise-only" and "signal-only." 

To test the CG convergence efficiency we plot in Fig. || 
for two datasets of different length: N d = 20, 196, 000 
(corresponding to 7>60) and N d = 1,211,760,000 (corre- 
sponding to V). Both datasets have been benchmarked for 
the cases of "signal-only" and "noise-only" TOD. In the 
latter case, the noise properties are kept independent of 
the TOD length to facilitate comparisons. Note that, af- 
ter a few steps, the precision decreases exponentially 
and drops to the 10 -6 level in a few tens of iterations. 
Although always exponential, the rate of convergence of 
the CG solver depends on the TOD length. This is a con- 
sequence of having kept fixed N% and N p while increasing 
Nd- In fact, the number of elements of X connected by the 
noise filter is lower for a longer TOD, a fact that makes 
this matrix more efficiently preconditioned by its diagonal 
part. 

To quote a single number, the algorithm converges, to 
10 -6 precision, in about 90 minutes (Npe = 8) to produce 
a map out of the entire (not averaged over circles) 14 
months time stream, expected from the Planck Surveyor 
@ 30 GHz. 

Of course, solver precision and accuracy (that is, "dis- 
tance" from the true solution) are not the same thing. In 
other words, being confident that the algorithm converges 
quickly is only half the story. We also want it to converge 
to the exact solution. This is a rather important issue be- 
cause we would not like the algorithm to kill modes in the 
map. Discussion of this point is deferred until the next 
Section. 



5. Numerical results 



Loo norm. However, in the latter case, convergence is signifi- 
cantly slower. 
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5.1. "Signal-only" 

In the limiting case of a noiseless TOD (i.e. d = Pm) 
Eq. (||) and Eq. (|4|) yield rh = m. Thus, any good map- 
making code should return the "true" , pixelized sky when 
fed with a "signal-only" TOD. We tested our code with 
a pure CMB anisotropy pattern ^ and the same CMB 
pattern contaminated by the foregrounds most relevant at 
30 GHz 0. The following scheme was employed to prepare 
the TOD. First, a "signal map" is built, pixelized at ~ 
FWHM/3 using the HEALPix pixelization {Af sidc = 256) 
and FWHM-smeared. The latter is "read" by simulating a 
30 GHz Planck/LFI time stream. This TOD is the input 
to the map-making code. 

In order for the code to run a noise filter, N _1 , is 
needed. First note that for this noiseless case the filter 
should, in principle, be irrelevant: as stated above, the 
GLS solution trivially gives back the "true" map indepen- 
dently of N _1 . We must keep in mind, however, that we 
only consider Af^ elements of the filter. In Fourier space 
this translates into estimating the noise inverse spectral 
density precisely at A/j frequencies, the lowest, / m i n say, 
being Af$ times smaller than the sampling frequency. One 
obvious limitation on A/*£ is that it must be fixed accord- 
ingly, so that the Fourier representation of N _1 well com- 
prises the spin frequency. The amplitude of modes with 
/ < /min, if any, is not recovered unless the correspond- 
ing frequencies are considered in the analysis. However, 
there are two reasons why one should not worry about 
this effect: first, and more importantly, it is small and be- 
comes utterly negligible for £ > a few. Furthermore, as 
previously stated, this contribution is likely to get lost 
in any case since it falls in a region of the time Fourier 
spectrum strongly dominated by the noise and/or, more 
realistically, suppressed by hardware-induced decoherence 
(e.g. instrumental high passing). We show (for the typical 
case Af% = 4096) the angular power spectra of the CMB 
only input and output maps together with their difference 
(lower panel of Fig. [|). It is clear that our map-making 
code recovers, as expected, the input power spectrum to 
an impressively high precision over the whole range of 
explored multipoles (although with a somewhat coarser 
accuarcy at low ts). The same happens when consider- 
ing the case of CMB plus foregrounds (see Fig. [|, upper 
panel): the level of the residuals between the input and 
output map is remarkably low, underlying the high accu- 
racy of our code in recovering the input signal even in the 
presence of sharp features (see Fig. ^, lower panel) . 

As shown above, the code also recovers to very high 
precision even the broadest features in the map, i.e. the 
ones corresponding to low ts. We want to stress that the 
CG solver converges to the corresponding modes in a very 
reasonable number of iterations (< 50, see Fig. ||). Other 
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Fig. 3. Upper panel: map-making output on "signal-only" 
TOD (CMB); lower panel: angular power spectrum of the 
input map (dark blue line) , together with the spectrum of 
the map given in upper panel. The two are virtually indis- 
tinguishable. The light blue line below shows the spectrum 
of the difference map. 

Fig. 4. Upper panel: map-making output on "signal- 
only" TOD (CMB and main foregrounds). Lower panel: 
Difference between the above and input maps. 



map-making codes, implementing different solving algo- 
rithms, seem not to share this property (Prunet et al. 
2000) requiring more sophisticated methods to speed up 
the convergence of low multipoles and/or a change of vari- 
able to solve for the featureless noise-only map (see Dore 
et al. 2001). 

A final comment before concluding this Section. As 
mentioned at the end of Sect. 4.3, precision and accuracy 



We use the standard Cold Dark Model with Q,cdm = 0.95, 
Qb = 0.05 and h — 0.5. No dipole was included. 
10 Specifically, synchrotron and free free emission were in- 
cluded as well as simulated emission from SZ clusters and mil- 
limetric bright sources. 



are not the same thing. We want to be sure that the CG al- 
gorithm converges to the "true" solution. So, from a given 
"signal-only" TOD we generate different maps by chang- 
ing the solver precision. For each of these maps we evaluate 
the maximum difference in pixel value between them and 
the input map. The results are shown in Fig. ^| where we 
plot this maximum difference, normalized to the norm 
of the input map, as a function of the required solver pre- 
cision e. There are two regimes separated by e ~ 10 -8 : the 
map accuracy decreases when e gets larger and saturates 
when e gets smaller. So, e ~ 10~ 8 is just about the right 
number to have a good compromise between speed of con- 
vergence and accuracy. In any case, any residual map error 
smaller than ~ 1 /zK is acceptable in most applications, 
given the Planck sensitivity goal. 

5.2. "Noise-only" 

In the opposite limiting case of a "noise-only" TOD (i.e. 
m = 0), the map-making solution [cf. Eq.(|J) and Eq.(|J)] 
gives rh = (P T N- 1 P)- 1 P T N- 1 n. The efficiency of a 
map-making code can be tested, in this case, by examining 
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Fig. 5. Maximum deviation between input and output 
maps in the "signal-only" case, as a function of the solver 
precision. Here A/£ = 2 11 (triangles) and A/^ = 2 17 
(squares) . 

the quality of the map obtained, a natural figure of merit 
being the variance of the reconstructed map or -better- its 
angular power spectrum. 

We generate a noise time stream assuming the follow- 
ing form for the noise spectral density: 



p(f)=A[i+(\f\/f k r] 



(15) 



where is the knee frequency. We choose = 0.1 Hz (the 
Planck goal), a = — 1 and an amplitude A corresponding 
to the expected white noise level of the 30 GHz receivers. 
The minimum and maximum frequencies are set by the 
inverse of the mission life-time and the sampling frequency 
(33 Hz) . We then generate a "noise-only" data stream that 
is the input to the map-making code. 

Fig. ^ shows the map obtained by just averaging dif- 
ferent observations of the same pixel (upper panel), the 
ML solution (middle panel) and the angular power spec- 
tra of the two (lower panel). It is clear from the mid- 
dle panel of this figure that the map-making algorithm 
strongly suppresses the stripes due to 1// noise, very vis- 
ible in the upper panel. This is confirmed by the angular 
power spectrum of the ML map (see lower panel of Fig. ^|) 
which is basically flat (as expected in the case of white 
and isotropic noise) for I > 100. The increasing power at 
£ < 100 is due to two effects: residuals of spurious corre- 
lations in the noise and nonuniform sky coverage due to 
the Planck scanning strategy. In spite of that, the overall 
level of instrumental noise has been considerably lowered 
by the map-making algorithm. 

The noise filter or, equivalently, the inverse of the 
noise spectral density, has been evaluated as explained 



in Sect. 3.1. In order to assess how good n is as a noise 
estimator, we use the same "noise-only" TOD to pro- 
duce two maps: the first, by using the "true" theoretical 
noise spectral density [cf. Eq.(|l5|)1; the second, by using 
the noise spectral density estimated with n. In both cases 
A/^ = 4096 as for the map in Fig. |(| In Fig. we show, 



Fig. 6. Upper panel: coadded map made out of a "noise- 
only" TOD. Note the stripes. Middle panel: map-making 
solution on "noise-only" TOD. Lower panel: angular 
power spectrum of the maps given above (black and blue 
curves, respectively). 



as a function of £, the percentage difference between the 
spectra derived from the two maps. So, although not op- 
timized to the specific case of the Planck Surveyor, the 
noise estimator n reproduces to quite a good extent the 
in-flight noise properties. In any case, as also shown in 
Fig. ^, the uncertainties introduced in the estimate of the 
noise angular power spectrum are less than those due to 
cosmic variance. 

As already discussed in the previous Sections, the 
length of the noise filter is an important issue to care- 
fully address. One should expect that in the "noise-only" 
case the dependence on A/j could be stronger because of 
the 1// tail in the noise power spectrum. To address this 
issue in detail and to show the sensitivity of our results 
to A/j, the following scheme is employed: we generate the 
"true" , fully correlated noise time stream, accordingly to 
Eq. (|l|). Then, we do map-making using a noise filter 
with A/j = A/"d/2, and evaluate the r.m.s. of this map, 
Then, we generate a set of maps out of the same fully 
correlated noise time stream, but imposing a noise filter 
length, A/"f, shorter and shorter w.r.t. A/d/2. In Fig. || we 
plot the fractional variation of the map r.m.s. w.r.t. a 
as a function of M^. It is clear from the figure that, for 
the Planck Surveyor, we can lower by a factor of ~ 
thousand w.r.t. A/d/2 and change the final r.m.s. of the 
map by much less than one percent. This shows that in- 
creasing A/^ above a given threshold does not induce any 
significant difference. This result holds even if we consider 
knee frequencies significantly higher than /& = 0.1 Hz: in 
Fig. H we also plot the effect of having /j. = 1 Hz and 



11 Note that we simulate the whole "noise-only", Md long, 
time stream by using Eq. (^) and standard FFT techniques. 
This time stream is obviously periodic. Thus, N -1 is strictly 
circulant even for Md = A/j/2. The same would not be true for 
real data. 



P. Natoli et al.: A Map-Making Algorithm for the Planck Surveyor 



9 




Fig. 7. Percentual difference, as a function of I, of the 
power spectra of the maps obtained using the true noise 
properties and the noise estimator n discussed in the text. 

fk = 10 Hz, surely two very pessimistic assumptions in 
the Planck context. 

In principle, the noise properties of the Planck de- 
tectors, as described by Eq. (|l5|), suggest the existence of 
spurious correlations in the TOD over the whole mission 
life-time. In practice, the error introduced by neglecting 
them is negligible, at least when the r.m.s. is used as a 
figure of merit. To stress this point even further, we evalu- 
ate the angular power spectrum from two maps generated 
with the same input (i.e. the same TOD), but using noise 
filters of different lengths. Again, we plot the percentual 
difference between the spectrum obtained with A/j — Md/2 
and the spectra obtained with noise filters shorter by a 
factor of 64 and 1024, respectively. Fig. || shows that this 
difference is below ~ 1% for £ > 100, well under cosmic 
variance in the region of interest for recovering the cosmo- 
logical parameters. 

To conclude this section let us note that the algorithm 
discussed here is intrinsically linear [cf. Eq.([j])] since the 
map estimator can be written as the "true" map plus a 
noise term: 

m = m + (P r N -1 P)- x P T N -1 n. 

Our code shares this property. In fact, if we produce a map 
from a "signal-only" TOD and sum it to the map produced 
from a "noise-only" TOD the result is virtually indistin- 
guishable (to machine precision) from the map obtained 
by summing the two ( "signal-only" and "noise-only" ) time 
streams. 

5.3. Balloon Borne Experiments 

As already mentioned, our code does not take any ad- 
vantage from the specific Planck scanning strategy. It 
is then straightforwardly applicable to other one-horned 
experiments, such as BOOMERanG (de Bernardis et al. 
2000). Here we do not want to enter in any data analy- 
sis issue. Rather, we want to show the flexibility and the 



Fig. 8. Shown is the percentual difference between the 
r.m.s. of a "noise-only" map obtained with a filter of 
length A/jt and er, the r.m.s. obtained for J\f$ = Afd/2- 
The three lines refer, from top to bottom, to fk — 10 Hz, 
fk = 1 Hz and f k — 0.1 Hz (the Planck goal). 
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Fig. 9. Shown is the percentual difference between noise- 
only CVs obtained for A/j = 2 17 (light blue curve) and 
A/*£ = 2 13 (dark blue curve) w.r.t. the power spectrum 
obtained with the longest possible filter (A/j = Afd/2). 
The red line is the uncertainty due to cosmic variance. 

efficiency of our code to process data coming from an ex- 
periment completely different from Planck. 

As far as the signal is concerned, we simulate a theo- 
retical input map 9 pixelized at 1/3 of the BOOMERanG 
FWHM (~ 10' @ 150 GHz) and properly smeared. 
Knowing its scanning strategy, we extract a typical 
BOOMERanG "signal-only" time stream. As far as the 
noise is concerned, we generate a "noise-only" time stream 
with the noise properties described in Eq.([l5|) and choos- 
ing A — 150 ^Ky^ and fk — 0.07 Hz. In what follows we 
consider a noise filter of length A/j = 2 17 . 

On the very same line of the previous subsections, we 
compare the theoretical input map with rh, which is re- 
constructed at a level of accuracy of 10 -5 when the CG 
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Fig. 10. Upper panel: map-making output on the sim- 
ulated BOOMERanG TOD (signal plus noise). Lower 
panel: residual noise map obtained subtracting the in- 
put map from the one shown above. The maps have 
AC, ~ 6 x 10 5 . 



solver precision is chosen to be 10 -6 . The length of the 
noise filter does not affect, even in this case, the final re- 
sults. 

In the above tests we only consider the 1 degree per sec- 
ond (d.p.s.) section of the BOOMERanG scan. However, 
in order to facilitate comparisons with Planck, we trun- 
cated the BOOMERanG simulated TOD at the length of 
■P60. Not surprisingly, our code runs the BOOMERanG 
map-making in ~ 15 minutes on a 500 MHz Pentium III 
workstation, while the whole 1 d.p.s. BOOMERanG scan 
is processed on the same machine in about 20 minutes. 

Given these time scales, a parallel machine does not 
seem necessary for the analysis of a single channel. 
However, a parallel environment becomes quite appeal- 
ing if one performs a combined analysis of more than one 
receiver. Also, having a parallel implementation may be 
important to perform extensive Monte Carlo simulations 
of the experiment. Specific applications of this code to 
BOOMERanG will be discussed elsewhere. 



6. Summary 

The purpose of this work was to present a parallel im- 
plementation of a map-making algorithm for CMB exper- 
iments. In particular, we have shown for the first time 
Maximum Likelihood, minimum variance maps obtained 
by processing the entire data stream expected from the 
Planck Surveyor, i.e. a TOD covering the full mission 
life span (14 months). Here we restrict ourselves to the 
simple case of the Planck/LFI 30 GHz channel. However, 
the extension of our implemented software to other, higher 
frequency, channels is straightforward. At the moment our 
implementation is limited to a symmetric antenna beam 
profile. 

The code was shown to scale linearly with the number 
of time samples A/d, logarithmically with the noise filter 
length and very slowly with the number of pixels Af p , the 
latter scaling being mostly influenced by the choice of the 
preconditioner. Furthermore, on a multiprocessor environ- 
ment the code scales nearly optimally with the number of 



A key problem for a successful map-making pipeline is 
to obtain a reliable estimate of the noise properties di- 
rectly from flight data. It is straightforward to further 
iterate the GLS solution implemented here to get more 
accurate estimates of the underlying noise. Here we have 
explicitly chosen not to do so. Rather, we stop to the zero- 
th order solution, n, showing that this is more than enough 
for the purposes discussed here. 

The possibility of running map-making algorithms 
on reasonably short time scales opens up the possibil- 
ity of evaluating the CMB angular power spectrum via 
Monte Carlo simulations. We will address this point in a 
forthcoming paper, together with the extension to non- 
symmetric antenna beam profile. 
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